Search CORE

4 research outputs found

Multi-objective evolutionary optimization for dimensionality reduction of texts represented by synsets

Author: Basto-Fernandes V.
Ezpeleta E.
Gómez-Meire S.
Méndez J. R.
Velez de Mendizabal I.
Zurutuza U.
Publication venue: 'PeerJ'
Publication date: 01/01/2023

Despite new developments in machine learning classification techniques, improving the accuracy of spam filtering remains difficult because of linguistic phenomena that limit its effectiveness. In particular, we highlight polysemy, synonymy, the usage of hypernyms/hyponyms, and the presence of irrelevant or confusing words. These problems should be solved at the pre-processing stage to avoid using inconsistent information when building classification models. Previous studies have suggested that synset-based representation strategies can successfully address synonymy and polysemy problems. Complementarily, hyponymy/hypernymy relations can be exploited to implement dimensionality reduction strategies. These strategies can unify textual terms to model the intentions of the document without losing any information (e.g., bringing together the synsets “viagra”, “cialis”, “levitra” and others representing similar drugs under “virility drug”, which is a hypernym of all of them). These feature reduction schemes are known as lossless strategies because the information is not removed but only generalised. However, in some types of text classification problems (such as spam filtering), it may not be worthwhile to keep all the information; instead, dimensionality reduction algorithms can be allowed to discard information that may be irrelevant or confusing. In this work, we formulate feature reduction as a multi-objective optimisation problem to be solved using a Multi-Objective Evolutionary Algorithm (MOEA). With minor modifications, our algorithm can implement lossless (using only semantic-based synset grouping), low-loss (discarding irrelevant information and using semantic-based synset grouping) or lossy (discarding only irrelevant information) strategies.
The contribution of this study is two-fold: (i) to introduce different dimensionality reduction methods (lossless, low-loss and lossy) as an optimization problem that can be solved using a MOEA and (ii) to provide an experimental comparison of lossless and low-loss schemes for text representation. The results obtained support the usefulness of the low-loss method for improving the efficiency of classifiers.

Repositório Institucional do ISCTE-IUL
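The lossless, low-loss and lossy strategies summarised in the PeerJ abstract above can be illustrated on a single bag-of-synsets. The Python sketch below is not the authors' MOEA: it assumes a hand-written, hypothetical hypernym map (`HYPERNYM`) in place of a lexical resource, and two illustrative objectives (number of features and number of discarded occurrences, both minimised) rather than the fitness functions used in the paper. The helper names `reduce_features`, `objectives` and `dominates` are likewise hypothetical.

```python
from collections import Counter

# Hypothetical hypernym map for illustration only; a real system would look
# these relations up in a lexical resource rather than hard-coding them.
HYPERNYM = {
    "viagra": "virility_drug",
    "cialis": "virility_drug",
    "levitra": "virility_drug",
}


def reduce_features(doc, generalize=None, discard=None):
    """Reduce a bag-of-synsets (synset -> count) for one document.

    generalize: synset -> hypernym map; merging keeps every occurrence (lossless).
    discard: synsets treated as irrelevant or confusing and dropped (lossy).
    """
    generalize = generalize or {}
    discard = discard or set()
    reduced = Counter()
    for synset, count in doc.items():
        if synset in discard:
            continue  # occurrence is removed, not generalised
        reduced[generalize.get(synset, synset)] += count
    return reduced


def objectives(doc, reduced):
    """Illustrative objectives, both minimised: feature count, discarded occurrences."""
    return (len(reduced), sum(doc.values()) - sum(reduced.values()))


def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


doc = Counter({"viagra": 2, "cialis": 1, "levitra": 1, "meeting": 1, "hello": 3})
lossless = reduce_features(doc, generalize=HYPERNYM)
low_loss = reduce_features(doc, generalize=HYPERNYM, discard={"hello"})
lossy = reduce_features(doc, discard={"hello"})

print(objectives(doc, lossless))  # (3, 0)
print(objectives(doc, low_loss))  # (2, 3)
print(objectives(doc, lossy))     # (4, 3)
print(dominates(objectives(doc, low_loss), objectives(doc, lossy)))  # True
```

In this toy run the low-loss reduction (2, 3) Pareto-dominates the lossy one (4, 3), while the lossless (3, 0) and low-loss results are mutually non-dominated. A MOEA would typically explore many such generalise/discard choices and return a front of these trade-offs rather than a single answer.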

Using Variable Precision Rough Set for Selection and Classification of Biological Knowledge Integrated in DNA Gene Expression

Author: Calvo-Dmgz D.
Fdez-Riverola F.
Glez-Peña D.
Gálvez J. F.
Gómez-Meire S.
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/12/2012

DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. This large amount of gene expression data has been used by researchers seeking to diagnose diseases such as cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge, provided as gene sets, into the classification process by means of Variable Precision Rough Set Theory (VPRS). The proposed model is able to highlight which part of the provided biological knowledge has been important for classification. This paper presents a novel model for microarray data classification which is able to incorporate prior biological knowledge in the form of gene sets. Based on this knowledge, we transform the input microarray data into supergenes, and then apply rough set theory to select the most promising supergenes and derive a set of easily interpretable classification rules. The proposed model is evaluated on three breast cancer microarray datasets, obtaining successful results when compared with classical classification techniques. The experimental results show that there are no significant differences between our model and classical techniques, but our model is able to provide a biologically interpretable explanation of how it classifies new samples.

Directory of Open Access Journals
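As a rough illustration of the VPRS step summarised in the abstract above, the sketch below collapses genes into supergenes, discretises them, and computes a β-lower approximation and the β-dependency (the fraction of samples in the β-positive region of some class), which can be used to rank candidate supergenes. It is not the authors' model: averaging gene-set members into a supergene, the sign-based "high"/"low" discretisation, the gene sets, expression values and labels are all hypothetical assumptions, and rule induction is omitted. β is used here as an inclusion threshold in (0.5, 1]; some formulations instead state it as an admissible error bound of 1 - β.

```python
from collections import defaultdict

# Hypothetical toy data: gene sets, expression values and labels are invented.
GENE_SETS = {"setA": ["g1", "g2"], "setB": ["g3"]}
SAMPLES = {
    "s1": {"g1": 2.0, "g2": 1.0, "g3": -1.0},
    "s2": {"g1": 1.0, "g2": 0.5, "g3": -0.5},
    "s3": {"g1": 0.5, "g2": 1.5, "g3": -2.0},
    "s4": {"g1": 1.0, "g2": -0.2, "g3": -0.1},
    "s5": {"g1": -1.0, "g2": -0.5, "g3": -0.3},
}
LABELS = {"s1": "tumour", "s2": "tumour", "s3": "tumour", "s4": "normal", "s5": "normal"}


def supergene_values(sample, gene_sets):
    """Collapse one sample (gene -> expression) into supergenes (set -> mean).
    Mean aggregation is an assumption of this sketch."""
    return {name: sum(sample[g] for g in genes) / len(genes)
            for name, genes in gene_sets.items()}


def equivalence_classes(table, attrs):
    """Group sample ids whose discretised values agree on every attribute in attrs."""
    classes = defaultdict(set)
    for sid, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(sid)
    return list(classes.values())


def beta_lower(table, attrs, target, beta):
    """VPRS beta-lower approximation (0.5 < beta <= 1): union of the equivalence
    classes whose fraction of members inside target is at least beta."""
    lower = set()
    for cls in equivalence_classes(table, attrs):
        if len(cls & target) / len(cls) >= beta:
            lower |= cls
    return lower


def beta_dependency(table, attrs, labels, beta):
    """Fraction of samples in the beta-positive region of some decision class."""
    positive = set()
    for label in set(labels.values()):
        target = {sid for sid, lab in labels.items() if lab == label}
        positive |= beta_lower(table, attrs, target, beta)
    return len(positive) / len(table)


# Discretise each supergene by sign (hypothetical choice): > 0 -> "high", else "low".
TABLE = {
    sid: {name: "high" if v > 0 else "low"
          for name, v in supergene_values(s, GENE_SETS).items()}
    for sid, s in SAMPLES.items()
}

for attr in ("setA", "setB"):
    print(attr, beta_dependency(TABLE, [attr], LABELS, 0.75))  # setA 1.0, setB 0.0
print("setA, beta=1.0:", beta_dependency(TABLE, ["setA"], LABELS, 1.0))  # 0.2
```

With β = 0.75, the class of four "high" samples (three of them tumours) still counts as tumour-positive, so `setA` reaches a β-dependency of 1.0; with β = 1.0 (the classical rough set case) only s5 stays in a positive region and the dependency drops to 0.2. `setB` takes the same value for every sample and scores 0.0, so a selection step based on this measure would prefer `setA`.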

Identification of iron ore brands by multi-component analysis and chemometric tools

Author: AK Patel
B Zhang
BV Canizo
C Distante
C Yan
D Xiao
E Grifoni
F Li
GG Arantes de Carvalho
H Zhao
JA Cohen
L Sheng
M Ostadrahimi
M Pardo
MJ Hidalgo
MWH Wang
N Gerhardt
NM Ralbovsky
P Wang
RG Brereton
S Gómez-Meire
Y Yang
YM Guo
ZQ Hao
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Liquid chromatographic methods coupled to chemometrics: a short review to present the key workflow for the investigation of wine phenolic composition as it is affected by environmental factors

Author: A Cifuentes
A Izquierdo-Llopart
AC Olivieri
AC Pereira
AE Springer
AL Pomerantsev
AM Martinez
AN Anthemidis
B Debska
B Vandeginste
C Fotakis
C Villano
CJC Burges
D Ballabio
D Cazzolino
D Granato
D Restuccia
D Serrano-Lourido
DL Massart
E Fayolle
E Funes
E Salvatore
E Villagra
EG Alves Filho
F Marini
F Rossetti
H Abdi
H Li
HE Tahir
I Cutzach
I Kapusta
I Kuzmanovski
IS Arvanitoyannis
J Dennis
J Saavedra
J Saurina
J Zhang
J-P Antignac
JCC Santana
JL Aleixandre-Tudo
JM Gambetta
JM González-Sáiz
JM Jurado
K Pyrzynska
K Pyrzyńska
K Zhang
L Cuadros-Rodríguez
L Holmberg
LA Berrueta
M Barker
M Ghasemi-Varnamkhasti
M Nazir
M Pacella
M Pantelić
M Perini
M Vilanova
MJ Martelo-Vidal
N Kalogeropoulos
NP Kalogiouri
P Oliveri
PL Pisano
R Gurbanov
R Ragone
RG Brereton
RG Dambergs
RZ Morawski
S Esslinger
S Gómez-Meire
S Medina
S Mika
S-Y Li
T Mehmood
V Canuti
V Garcia-Canas
V Vapnik
X-Z Hu
Y Xu
Z Liu
Z Xiao
Publication venue: 'Springer Science and Business Media LLC'

Crossref